-
Notifications
You must be signed in to change notification settings - Fork 598
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(frontend): support idle in transaction session timeout #14566
feat(frontend): support idle in transaction session timeout #14566
Conversation
IMO the major motivation of #13940 is avoiding "pinned Hummock epoch to last forever", which requires proactively timing and killing the transaction once it timed out. Can this PR achieve that? |
I would like to improve it. Currently, this PR needs another statement to trigger the abortion of that session/transaction. |
src/utils/pgwire/src/pg_protocol.rs
Outdated
@@ -550,6 +550,12 @@ where | |||
record_sql_in_span(&sql); | |||
let session = self.session.clone().unwrap(); | |||
|
|||
if session.check_idle_in_transaction_timeout() { | |||
self.process_terminate(); | |||
return Err(PsqlError::SimpleQueryError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about adding a new error variant? I believe the handling logic can be quite similar to handling Panic
here.
risingwave/src/utils/pgwire/src/pg_protocol.rs
Lines 350 to 360 in 240416f
PsqlError::Panic(_) => { | |
self.stream | |
.write_no_flush(&BeMessage::ErrorResponse(Box::new(e))) | |
.ok()?; | |
let _ = self.stream.flush().await; | |
// Catching the panic during message processing may leave the session in an | |
// inconsistent state. We forcefully close the connection (then end the | |
// session) here for safety. | |
return None; | |
} |
I assume we can start a background task to monitor the timeout when a transaction is started. |
Yeah, that's one approach. In my mind, we can set up a timer background task for each transaction. In either way, I am afraid this implementation can't be compatible. |
src/frontend/src/session.rs
Outdated
// Idle transaction background monitor | ||
let join_handle = tokio::spawn(async move { | ||
let mut check_idle_txn_interval = | ||
tokio::time::interval(core::time::Duration::from_secs(5)); | ||
check_idle_txn_interval.set_missed_tick_behavior(tokio::time::MissedTickBehavior::Skip); | ||
check_idle_txn_interval.reset(); | ||
loop { | ||
check_idle_txn_interval.tick().await; | ||
sessions.read().values().for_each(|session| { | ||
let _ = session.check_idle_in_transaction_timeout(); | ||
}) | ||
} | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a background monitor to unpin snapshots proactively
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A little bit brute-force 🤣
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's okay to spawn many tasks in tokio. So what about one background task for one session? 🥲 Just remember to cancel it when session gets dropped, or simply use a weak reference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Creating a task for each session seems much more heavy 🥵. This check could be triggered in a longer period and much more friendly to high concurrent workload I think.
Will review it the next day. :) |
src/frontend/src/session.rs
Outdated
// Idle transaction background monitor | ||
let join_handle = tokio::spawn(async move { | ||
let mut check_idle_txn_interval = | ||
tokio::time::interval(core::time::Duration::from_secs(5)); | ||
check_idle_txn_interval.set_missed_tick_behavior(tokio::time::MissedTickBehavior::Skip); | ||
check_idle_txn_interval.reset(); | ||
loop { | ||
check_idle_txn_interval.tick().await; | ||
sessions.read().values().for_each(|session| { | ||
let _ = session.check_idle_in_transaction_timeout(); | ||
}) | ||
} | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A little bit brute-force 🤣
@@ -132,6 +138,12 @@ impl ExecContextGuard { | |||
} | |||
} | |||
|
|||
impl Drop for ExecContext { | |||
fn drop(&mut self) { | |||
*self.last_idle_instant.lock() = Some(Instant::now()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As the last_idle_instant
is only updated when execution completes, if a batch query runs for a long time e.g. scanning a large amount of data, will it be killed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As the
last_idle_instant
is only updated when execution completes, if a batch query runs for a long time e.g. scanning a large amount of data, will it be killed?
According to the postgresql docs, this variable doesn't care about how long a statement is running, which should be handled by parameters like statement timeout
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From PG's documents:
Terminate any session that has been idle (that is, waiting for a client query) within an open transaction for longer than the specified amount of time.
IIUC, that means a running batch query should not be killed. Do we behave at the same way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we won't kill any query, but unpin the snapshot (either by background monitor or session termination triggered by a later statement). The session would be terminated when the later statement arrives.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but unpin the snapshot
By "killed" I mean any means that cause the query to end. When a snapshot is unpinned, it's possible that the (running) iterator runs into errors and terminates, is it the cases? If so, I was saying that we should avoid such thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just ensure there is no running SQL in the current session during calling unpin_snapshot
in check_idle_in_transaction_timeout
, so we don't need to worry about that.
Oops, this seems to be the behavior of Postgres. 😂 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
idle_in_transaction_session_timeout
and its default value is 60000(ms) i.e. 1min.Checklist
./risedev check
(or alias,./risedev c
)Documentation
Release note
If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.